Goto

Collaborating Authors

 positive bag








Deep Fragment Embeddings for Bidirectional Image Sentence Mapping

Andrej Karpathy, Armand Joulin, Li F. Fei-Fei

Neural Information Processing Systems

We introduce a model for bidirectional retrieval of images and sentences through a deep, multi-modal embedding of visual and natural language data. Unlike previous models that directly map images or sentences into a common embedding space, our model works on a finer level and embeds fragments of images (objects) and fragments of sentences (typed dependency tree relations) into a common space. We then introduce a structured max-margin objective that allows our model to explicitly associate these fragments across modalities. Extensive experimental evaluation shows that reasoning on both the global level of images and sentences and the finer level of their respective fragments improves performance on image-sentence retrieval tasks. Additionally, our model provides interpretable predictions for the image-sentence retrieval task since the inferred inter-modal alignment of fragments is explicit.


Convex Multiple-Instance Learning by Estimating Likelihood Ratio

Neural Information Processing Systems

We propose an approach to multiple-instance learning that reformulates the problem as a convex optimization on the likelihood ratio between the positive and the negative class for each training instance. This is casted as joint estimation of both a likelihood ratio predictor and the target (likelihood ratio variable) for instances. Theoretically, we prove a quantitative relationship between the risk estimated under the 0-1 classification loss, and under a loss function for likelihood ratio. It is shown that likelihood ratio estimation is generally a good surrogate for the 0-1 loss, and separates positive and negative instances well. The likelihood ratio estimates provide a ranking of instances within a bag and are used as input features to learn a linear classifier on bags of instances. Instance-level classification is achieved from the bag-level predictions and the individual likelihood ratios. Experiments on synthetic and real datasets demonstrate the competitiveness of the approach.


Deep Fragment Embeddings for Bidirectional Image Sentence Mapping

Neural Information Processing Systems

We introduce a model for bidirectional retrieval of images and sentences through a deep, multi-modal embedding of visual and natural language data. Unlike previous models that directly map images or sentences into a common embedding space, our model works on a finer level and embeds fragments of images (objects) and fragments of sentences (typed dependency tree relations) into a common space. We then introduce a structured max-margin objective that allows our model to explicitly associate these fragments across modalities. Extensive experimental evaluation shows that reasoning on both the global level of images and sentences and the finer level of their respective fragments improves performance on image-sentence retrieval tasks. Additionally, our model provides interpretable predictions for the image-sentence retrieval task since the inferred inter-modal alignment of fragments is explicit.


CAMEL2: Enhancing weakly supervised learning for histopathology images by incorporating the significance ratio

Xu, Gang, Wang, Shuhao, Zhao, Lingyu, Chen, Xiao, Wang, Tongwei, Wang, Lang, Luo, Zhenwei, Wang, Dahan, Zhang, Zewen, Liu, Aijun, Ba, Wei, Song, Zhigang, Shi, Huaiyin, Zhong, Dingrong, Ma, Jianpeng

arXiv.org Artificial Intelligence

Histopathology image analysis plays a crucial role in cancer diagnosis. However, training a clinically applicable segmentation algorithm requires pathologists to engage in labour-intensive labelling. In contrast, weakly supervised learning methods, which only require coarse-grained labels at the image level, can significantly reduce the labeling efforts. Unfortunately, while these methods perform reasonably well in slide-level prediction, their ability to locate cancerous regions, which is essential for many clinical applications, remains unsatisfactory. Previously, we proposed CAMEL, which achieves comparable results to those of fully supervised baselines in pixel-level segmentation. However, CAMEL requires 1,280x1,280 image-level binary annotations for positive WSIs. Here, we present CAMEL2, by introducing a threshold of the cancerous ratio for positive bags, it allows us to better utilize the information, consequently enabling us to scale up the image-level setting from 1,280x1,280 to 5,120x5,120 while maintaining the accuracy. Our results with various datasets, demonstrate that CAMEL2, with the help of 5,120x5,120 image-level binary annotations, which are easy to annotate, achieves comparable performance to that of a fully supervised baseline in both instance- and slide-level classifications.